20 research outputs found

    Affine invariant visual phrases for object instance recognition

    Object instance recognition approaches based on the bag-of-words model are severely affected by the loss of spatial consistency during retrieval. As a result, costly RANSAC verification is needed to ensure geometric consistency between the query and the retrieved images. A common alternative is to inject geometric information directly into the retrieval procedure, by endowing the visual words with additional information. Most of the existing approaches in this category can efficiently handle only restricted classes of geometric transformations, including scale and translation. In this paper, we propose a simple and efficient scheme that can cover the more complex class of full affine transformations. We demonstrate the usefulness of our approach in the case of planar object instance recognition, such as recognition of books, logos, traffic signs, etc. This work was funded by a Google Faculty Research Award, the Marie Curie grant CIG-334283-HRGP, and a CNRS chaire d'excellence. This is the author accepted manuscript. The final version is available at http://dx.doi.org/10.1109/MVA.2015.715312

    A vertical stereoscopic system based on 1D image matching

    Stereoscopic systems have proved widely applicable in recent years in various domains: robotics, surveillance, 3D maps, etc. Conceived as two-camera systems or as catadioptric mono-camera systems, such equipment relies on the possibility of recovering the third dimension of a scene from two images of it, taken from different points of view. In this paper, we present an original stereoscopic system, as well as the dedicated image matching process. In order to respect timing constraints, we base the image matching on 1D processing. We follow two steps in achieving the 3D reconstruction: first, we detect and match interest points, using progressive Gaussian filtering and correlation measures; then we proceed to global matching, using the results of the first step to improve accuracy. The results were validated using poor-quality real images, taken in a cave, as part of a virtual visiting project.
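
    As a rough illustration of correlation-based matching restricted to one dimension, the sketch below searches along a single scan line for the shift that maximises a zero-mean normalised cross-correlation. The choice of this particular correlation measure, the function names, and the parameters `half_window` and `max_shift` are assumptions made for the example; the abstract does not specify them, and the progressive Gaussian filtering step is not shown.

```python
import math

def zncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-length 1D windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat window: correlation is undefined, treat as no evidence
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match_1d(line_a, pos, line_b, half_window, max_shift):
    """Find the shift of the window centred at `pos` in line_a that best matches line_b."""
    ref = line_a[pos - half_window: pos + half_window + 1]
    best_shift, best_score = None, -2.0
    for shift in range(-max_shift, max_shift + 1):
        start = pos + shift - half_window
        end = pos + shift + half_window + 1
        if start < 0 or end > len(line_b):
            continue  # candidate window falls outside the line
        score = zncc(ref, line_b[start:end])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score

# Hypothetical data: line_b is line_a shifted by 3 samples, with a gain and an offset.
line_a = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
line_b = [10, 10, 10, 10, 10, 12, 20, 28, 20, 12, 10, 10]
shift, score = best_match_1d(line_a, pos=4, line_b=line_b, half_window=2, max_shift=4)
```

    Because the measure is normalised, the match is insensitive to the gain and offset differences between the two lines in this toy data.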

    A contrario patch matching, with an application to keypoint matches validation

    We describe a simple metric for image patch similarity, together with a robust criterion for unsupervised patch matching. The gradient orientations at corresponding positions in the two patches are compared and the normalized errors are accumulated. Based on the a contrario framework, the matching criterion validates a match between two patches when this cumulative error is too small to have occurred as the result of an accidental agreement. The method is illustrated in the validation of keypoint matches.
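
    The decision rule described above can be sketched as follows. This is a minimal illustration under a stated assumption, not the authors' exact formulation: the null model takes the normalized orientation errors to be independent and uniform on [0, 1], so their sum follows the Irwin-Hall distribution. The function names, the `n_tests` count and the threshold `epsilon` are hypothetical.

```python
import math
from fractions import Fraction

def normalized_orientation_error(theta_a, theta_b):
    """Angular distance between two orientations (radians), scaled to [0, 1]."""
    d = abs(theta_a - theta_b) % (2 * math.pi)
    return min(d, 2 * math.pi - d) / math.pi

def irwin_hall_cdf(s, n):
    """P(U_1 + ... + U_n <= s) for independent U_i uniform on [0, 1]."""
    if s <= 0:
        return 0.0
    if s >= n:
        return 1.0
    x = Fraction(s)  # exact arithmetic avoids cancellation in the alternating sum
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
                for k in range(int(math.floor(s)) + 1))
    return float(total / math.factorial(n))

def number_of_false_alarms(orients_a, orients_b, n_tests):
    """Expected number of patch pairs with an error this small under the null model."""
    errors = [normalized_orientation_error(a, b)
              for a, b in zip(orients_a, orients_b)]
    return n_tests * irwin_hall_cdf(sum(errors), len(errors))

def is_meaningful_match(orients_a, orients_b, n_tests, epsilon=1.0):
    """Validate the match when the cumulative error is unlikely to arise by chance."""
    return number_of_false_alarms(orients_a, orients_b, n_tests) < epsilon
```

    With this rule, two patches whose gradient orientations agree everywhere are accepted, while patches with systematically opposite orientations are rejected for any `n_tests` of at least 1.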

    "Bubble tag"-based system for object authentication

    Biometric systems are omnipresent nowadays in fields that require user authentication (e.g. access control, banking operations), due to the main attributes of biometric characteristics: uniqueness, permanence, collectability. For manufactured products, a "Bubble Tag"-based system could achieve the same performance as a biometric system. In this article we propose solutions for signature extraction and for a "1 to many" authentication protocol applicable to the Bubble Tag. We briefly present a signature extraction method that is invariant under perspective. For the "1 to many" authentication protocol we recommend the use of an LSH (locality sensitive hashing) approach. Tests carried out on randomly computer-generated images gave promising results and indicated leads to be followed for real images. ©2010 IEEE
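
    To make the "1 to many" lookup concrete, here is a small sketch of one LSH family, bit sampling for Hamming distance. It assumes that each tag signature is a fixed-length sequence of bits, which the abstract does not state; the class name, parameters and default values are likewise hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Index of binary signatures using several bit-sampling hash tables."""

    def __init__(self, n_bits, n_tables=8, bits_per_key=12, seed=0):
        rng = random.Random(seed)
        # Each table hashes a signature by reading a fixed random subset of its bits.
        self.projections = [rng.sample(range(n_bits), bits_per_key)
                            for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.signatures = {}

    @staticmethod
    def _key(signature, positions):
        return tuple(signature[p] for p in positions)

    def add(self, tag_id, signature):
        self.signatures[tag_id] = signature
        for table, positions in zip(self.tables, self.projections):
            table[self._key(signature, positions)].append(tag_id)

    def query(self, signature, max_distance):
        """Return (tag_id, hamming_distance) of the closest candidate, or None."""
        candidates = set()
        for table, positions in zip(self.tables, self.projections):
            candidates.update(table.get(self._key(signature, positions), []))
        best = None
        for tag_id in candidates:
            dist = sum(a != b for a, b in zip(signature, self.signatures[tag_id]))
            if dist <= max_distance and (best is None or dist < best[1]):
                best = (tag_id, dist)
        return best

# Hypothetical enrolment of 100 random 256-bit signatures.
rng = random.Random(1)
index = BitSamplingLSH(n_bits=256)
enrolled = {i: [rng.randint(0, 1) for _ in range(256)] for i in range(100)}
for tag_id, sig in enrolled.items():
    index.add(tag_id, sig)
hit = index.query(enrolled[42], max_distance=10)
```

    Only signatures that share a bucket with the query in at least one table are compared exactly, so the cost of a lookup does not grow linearly with the number of enrolled tags.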

    Spatio-temporal video autoencoder with differentiable memory

    We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal decoder a robust optical flow prediction module together with an image sampler serving as built-in feedback loop. The architecture is end-to-end differentiable. At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the current frame to generate the next frame. By minimising the reconstruction error between the predicted next frame and the corresponding ground truth next frame, we train the whole system to extract features useful for motion estimation without any supervision effort. We present one direct application of the proposed framework in weakly-supervised semantic segmentation of videos through label propagation using optical flow.

    Scenenet: An annotated model generator for indoor scene understanding

    © 2016 IEEE. We introduce Scenenet, a framework for generating high-quality annotated 3D scenes to aid indoor scene understanding. Scenenet leverages manually-annotated datasets of real-world scenes such as NYUv2 to learn statistics about object co-occurrences and their spatial relationships. Using a hierarchical simulated annealing optimisation, these statistics are exploited to generate a potentially unlimited number of new annotated scenes, by sampling objects from various existing databases of 3D objects such as Modelnet, and textures such as OpenSurfaces and ArchiveTextures. Depending on the task, Scenenet can be used directly in the form of annotated 3D models for supervised training and 3D reconstruction benchmarking, or in the form of rendered annotated sequences of RGB-D frames or videos.

    Massively parallel video networks

    We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-rate clocks, these models perform a minimal amount of computation (e.g. as few as four convolutional layers) for each frame per timestep to produce an output. The models are still very deep, with dozens of such operations being performed but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation. The results show that a significant degree of parallelism, and implicitly speedup, can be achieved with little loss in performance.

    Detection of mirror-symmetric image patches

    We propose a novel approach for detecting partial reflectional symmetry in images. Our method consists of two principal stages: candidate selection and validation. In the first step, candidates for mirror-symmetric patches are identified using an existing heuristic procedure based on Hough voting. The candidates are then validated using a principled statistical procedure inspired by the a contrario theory, which minimizes the number of false positives. Our algorithm uses integral image properties to reduce the execution time. © 2013 IEEE
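
    For readers unfamiliar with integral images, the sketch below shows the general technique: after one pass over the image, the sum over any axis-aligned rectangle is obtained from four table lookups. How the paper applies this inside its symmetry detector is not described in the abstract, so the example only illustrates the data structure; the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros at the top and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1, in constant time."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)         # whole image
lower_right = box_sum(ii, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

    The cost of each rectangle sum is independent of the rectangle size, which is what makes repeated patch statistics cheap.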